
    LocTree3 prediction of localization

    The prediction of protein sub-cellular localization is an important step toward elucidating protein function. For each query protein sequence, LocTree2 applies machine learning (profile kernel SVM) to predict the native sub-cellular localization in 18 classes for eukaryotes, six for bacteria and three for archaea. The method outputs a score that reflects the reliability of each prediction. LocTree2 has performed on par with or better than other state-of-the-art methods. Here, we report the availability of LocTree3 as a public web server. The server includes the machine learning-based LocTree2 and improves over it through the addition of homology-based inference. Assessed on sequence-unique data, LocTree3 reached an 18-state accuracy Q18 = 80 ± 3% for eukaryotes and a six-state accuracy Q6 = 89 ± 4% for bacteria. The server accepts submissions ranging from single protein sequences to entire proteomes. Response time of the unloaded server is about 90 s for a 300-residue eukaryotic protein and a few hours for an entire eukaryotic proteome, not counting the time needed to generate the alignments. For over 1000 completely sequenced organisms, the predictions are directly available as downloads. The web server is available at http://www.rostlab.org/services/loctree3
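
    The abstract describes LocTree3 as LocTree2's machine-learning prediction combined with homology-based inference, and reports per-state accuracies such as Q18 and Q6. The Python sketch below illustrates that general pattern only: the hit format, the 40% identity cut-off in `min_identity`, and the function names are hypothetical, and the actual LocTree3 decision rule may differ. Qn is computed here as the percentage of proteins whose predicted class (one of n classes) equals the observed one.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical database hit: (sequence identity in %, localization class of the hit).
Hit = Tuple[float, str]


def predict_localization(
    hits: Sequence[Hit],
    ml_predict: Callable[[], str],
    min_identity: float = 40.0,  # hypothetical cut-off, not LocTree3's documented value
) -> Tuple[str, str]:
    """Transfer the class of the most similar annotated homolog if it is
    similar enough; otherwise fall back to the machine-learning prediction."""
    best: Optional[Hit] = max(hits, key=lambda h: h[0], default=None)
    if best is not None and best[0] >= min_identity:
        return best[1], "homology"
    return ml_predict(), "machine learning"


def q_accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Qn: percentage of proteins whose predicted class equals the observed class."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Usage with made-up hits; the lambda stands in for an ML predictor.
print(predict_localization([(85.0, "nucleus"), (30.0, "cytoplasm")], lambda: "mitochondrion"))
# ('nucleus', 'homology')
print(predict_localization([(25.0, "nucleus")], lambda: "mitochondrion"))
# ('mitochondrion', 'machine learning')
```

    Falling back to the learned predictor only when no close homolog exists is one common way to combine the two sources; a real system might instead weigh both by their reliability scores.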

    The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

    Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in the volume of data analyzed and in the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in the Candida albicans and Pseudomonas aeruginosa genomes, which provided genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster that we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of molecular function and biological process annotations have slightly improved over time, those of cellular component annotations have not. Term-centric prediction of experimental annotations remains equally challenging; although the top methods perform significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, there is still considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
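
    In a term-centric evaluation, each ontology term is scored on its own: proteins are ranked by their predicted score for that term and the ranking is compared with the experimental labels. The Python sketch below assumes the area under the ROC curve (AUC) as the per-term metric and computes it with the rank-pair (Mann-Whitney) formulation; the gene identifiers and scores are hypothetical. It also shows why a baseline that gives every protein the same score for a term, as a frequency-based naive predictor would, sits at exactly 0.5.

```python
from typing import Mapping, Set


def term_auc(scores: Mapping[str, float], positives: Set[str]) -> float:
    """ROC AUC for one term: the probability that a randomly chosen
    experimentally annotated protein scores higher than a randomly chosen
    unannotated one (ties count as one half)."""
    pos = [s for protein, s in scores.items() if protein in positives]
    neg = [s for protein, s in scores.items() if protein not in positives]
    if not pos or not neg:
        raise ValueError("need at least one annotated and one unannotated protein")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for one term (e.g. a biofilm-related process).
method = {"g1": 0.9, "g2": 0.8, "g3": 0.3, "g4": 0.1}
baseline = {g: 0.2 for g in method}  # same score for every gene
print(term_auc(method, {"g1", "g3"}))    # 0.75
print(term_auc(baseline, {"g1", "g3"}))  # 0.5
```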

    Predicted Molecular Effects of Sequence Variants Link to System Level of Disease

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains largely out of reach. Here, we present new results that reinforce earlier work suggesting some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we introduced OMIM SAVs into the corresponding mouse orthologs. Overall, fewer variants were predicted to have an effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease.
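
    Putting a human OMIM SAV into a mouse ortholog requires mapping the variant position through an alignment of the two sequences. The abstract does not say how this mapping was done; the Python sketch below assumes a gapped pairwise alignment is already available and uses hypothetical function names and a toy alignment. Variants whose position falls into a gap, or whose wild-type residue differs in the mouse sequence, are skipped here, which is one plausible policy rather than the authors' documented one.

```python
from typing import Optional


def map_position(src_aln: str, dst_aln: str, src_pos: int) -> Optional[int]:
    """Map a 1-based residue position of the source sequence onto the
    destination sequence via a gapped pairwise alignment ('-' marks a gap).
    Returns None if the source residue is aligned to a gap."""
    if len(src_aln) != len(dst_aln):
        raise ValueError("aligned sequences must have equal length")
    i = j = 0  # residues seen so far in source and destination
    for a, b in zip(src_aln, dst_aln):
        if a != "-":
            i += 1
        if b != "-":
            j += 1
        if a != "-" and i == src_pos:
            return j if b != "-" else None
    raise IndexError("position lies beyond the end of the source sequence")


def transfer_sav(sav: str, src_aln: str, dst_aln: str) -> Optional[str]:
    """Translate a SAV written as wild type, position, mutant (e.g. 'R3W')
    into destination coordinates; None if the position is gapped or the
    wild-type residue is not the same in the ortholog."""
    wt, pos, mut = sav[0], int(sav[1:-1]), sav[-1]
    dst_pos = map_position(src_aln, dst_aln, pos)
    if dst_pos is None:
        return None
    dst_residue = dst_aln.replace("-", "")[dst_pos - 1]
    return f"{wt}{dst_pos}{mut}" if dst_residue == wt else None


# Toy alignment (hypothetical): human MKRWL vs mouse MRAWL.
human, mouse = "MKR-WL", "M-RAWL"
print(transfer_sav("R3W", human, mouse))  # R2W
print(transfer_sav("K2E", human, mouse))  # None (aligned to a gap)
```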

    Predictions of SAV effects upon function and disease across species.

    The numbers above bars give the number of SAVs in the set. A: Three methods (SNAP2 [16], SIFT [27], PolyPhen-2 [12]) predicted SAV effects upon molecular function (TrEffect/TrNeutral) and upon disease (OMIM). For this panel only, SNAP2 was trained without using disease SAVs from OMIM [5] or HumVar [28]. The SNAP2 version trained exclusively on molecular function clearly captured aspects of OMIM disease SAVs (the leftmost bar, OMIM, is higher than the second bar from the left, TrEffect). TrNeutral was the SNAP2 training set of variants without effect. Comparing the bars for TrNeutral and OMIM for each method pointed to different thresholds: PolyPhen-2 correctly predicted more effect in OMIM than SNAP2 did, but also incorrectly predicted more effect in the neutral data, i.e. it simply predicted more variants as having an effect. B: OMIM is repeated from A. SNAP2 captured disease signals in humans and animals at similar levels. OMIA contained disease SAVs from animals other than mouse and rat (mostly dog and cattle). C: SNAP2 predicted less effect for OMIM SAVs in mouse orthologs than in human. Left bar (OMIM with mouse ortholog): SNAP2 predictions for the subset of all 4,229 OMIM SAVs for which we found a mouse ortholog. Right bar (OMIM in mouse): SNAP2 predictions when putting the human SAV into the mouse sequence.
    D: Disease variants occur at non-random positions. Left bar (NotOMIM conserved): in each protein with an OMIM SAV, we predicted the effect of all SAVs at positions with a level of sequence conservation ≥ that of the OMIM variant. Right bar (NotOMIM not conserved): predictions for SAVs at non-OMIM positions with conservation < that of the OMIM SAV. Evidently, OMIM SAVs were very well conserved.
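
    Panel D compares two background sets drawn from the same proteins as the OMIM SAVs. The Python sketch below shows one way such sets could be built for a single protein, assuming per-residue conservation scores are already available; the score source, the hypothetical function name, and the choice to skip every substitution at the OMIM position itself are assumptions. Each resulting list would then be scored by an effect predictor such as SNAP2.

```python
from typing import List, Sequence, Tuple

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_by_conservation(
    sequence: str,
    conservation: Sequence[float],
    omim_pos: int,
) -> Tuple[List[str], List[str]]:
    """Enumerate all SAVs at non-OMIM positions of one protein and split them
    into those at positions conserved at least as much as the OMIM position
    ('NotOMIM conserved') and the rest ('NotOMIM not conserved').
    Positions are 1-based; conservation holds one score per residue."""
    if len(conservation) != len(sequence):
        raise ValueError("need one conservation score per residue")
    threshold = conservation[omim_pos - 1]
    conserved: List[str] = []
    not_conserved: List[str] = []
    for pos, (wt, score) in enumerate(zip(sequence, conservation), start=1):
        if pos == omim_pos:
            continue  # assumption: exclude the OMIM position entirely
        savs = [f"{wt}{pos}{mut}" for mut in STANDARD_AMINO_ACIDS if mut != wt]
        (conserved if score >= threshold else not_conserved).extend(savs)
    return conserved, not_conserved


# Toy protein with made-up conservation scores; the OMIM SAV sits at position 3.
conserved, not_conserved = split_by_conservation("MKRW", [0.9, 0.2, 0.7, 0.8], 3)
print(len(conserved), len(not_conserved))  # 38 19
```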

    Machine learning reveals genetic modifiers of the immune microenvironment of cancer

    Summary: Heritability in the immune tumor microenvironment (iTME) has been widely observed yet remains largely uncharacterized. Here, we developed a machine learning approach to map iTME modifiers within loci from genome-wide association studies (GWASs) for breast cancer (BrCa) incidence. A random forest model was trained on a positive set of immune-oncology (I-O) targets, and then used to assign I-O target probability scores to 1,362 candidate genes in linkage disequilibrium with 155 BrCa GWAS loci. Cluster analysis of the most probable candidates revealed two subfamilies of genes related to effector functions and adaptive immune responses, suggesting that iTME modifiers impact multiple aspects of anticancer immunity. Two of the top-ranking BrCa candidates, LSP1 and TLR1, were orthogonally validated as iTME modifiers using BrCa patient biopsies and comparative mapping studies, respectively. Collectively, these data demonstrate a robust and flexible framework for functionally fine-mapping GWAS risk loci to identify translatable therapeutic targets.

    Glucoside